CLARIT TREC-8 Experiments in Searching Web Data

Authors

  • Jeffrey Bennett
  • Xiang Tong
  • David A. Evans
Abstract

CLARITECH submitted two baseline content-only runs and completed two additional content+link runs in the TREC-8 Web Track. These represent our first serious attempt to deal with Web data, and our first automatic runs in several years. The first question was whether CLARIT would perform as well on Web data as on more traditional text. We found that, with extensive pre-processing of the raw data prior to indexing, the automatic retrieval system actually performed better on Web data than on Ad Hoc data. For the link runs, we implemented a version of the HITS algorithm [Kleinberg 1997], originally developed at IBM. Our version optimized HITS for the CLARIT environment, but also reflected some constraints imposed by limited resources. Unable to develop and sufficiently test our own matrix-processing library in time, we used a commercial product for the number crunching. Performance on the link runs was poor, but failure analysis suggests many ways to improve it.
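For readers unfamiliar with HITS, the following is a minimal sketch of the basic hub/authority iteration. It is not the CLARIT-optimized version described in the abstract, whose details are not given here. The function names `hits` and `normalize`, the edge-list input format, and the fixed iteration count are assumptions made for illustration only.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all scores are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def hits(edges, iterations=50):
    """Basic HITS iteration over a list of (source, target) links.

    Returns two dicts: hub scores and authority scores per node.
    """
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking to each page.
        new_auth = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: sum the (new) authority scores of pages each page links to.
        new_hub = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


if __name__ == "__main__":
    # Hypothetical toy graph: pages "a" and "b" both link to "c"; "a" also links to "d".
    hub, auth = hits([("a", "c"), ("b", "c"), ("a", "d")])
    print(max(auth, key=auth.get), max(hub, key=hub.get))
```

On a real Web collection the same updates are usually expressed as repeated multiplication by the link adjacency matrix and its transpose, which is where a matrix-processing library of the kind mentioned above would come in.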

Similar resources

Design and Evaluation of the CLARIT-TREC-2 System

All of the results we report in this paper follow from straightforward applications of base-level CLARIT processing, utilizing essentially the same CLARIT components that were employed in the CLARIT–TREC1 system. The general improvements we observe in CLARIT–TREC-2 processing are attributable to modifications (especially simplifications) in processing steps and in the settings of system variables...

CLARIT Compound Queries and Constraint-Controlled Feedback in TREC-5 Ad-Hoc Experiments

A fundamental problem for searching over large databases in ad-hoc mode is the formulation of an effective initial query that is both comprehensive and focused. The query needs to be comprehensive enough to retrieve, on its own or enhanced by various automatic feedback techniques, relevant documents that possibly address different aspects of the topic. At the same time, it has to be focused eno...

CLARIT TREC-8 CLIR Experiments

In the TREC-8 cross-language information retrieval (CLIR) track, we adopted the approach of using machine translation to prepare a source-language query for use in a target-language retrieval task. We empirically evaluated (1) the effect of pseudo relevance feedback on retrieval performance with two feedback vector length control methods in CLIR and (2) the effect of multilingual data merging e...

Evaluation of Syntactic Phrase Indexing -- CLARIT NLP Track Report

The CLARIT NLP track effort is focused on evaluating the usefulness of syntactic phrases for document indexing. The CLARIT system has several NLP techniques integrated with the vector space retrieval model [Evans et al. 91, Evans et al. 95]. The NLP techniques used in CLARIT include morphological analysis, robust noun-phrase parsing, and automatic construction of first order thesauri, among others...

Publication date: 1999